title: Researchcation, concluded
date: 2014-06-03 14:00
author: Christine Lemmer-Webber
tags: researchcation, mediagoblin, kinda-mediagoblin
slug: researchcation-result
---
I just got done with a three-week thing I've dubbed "researchcation".
It's exactly what it sounds like, research + vacation.

It's hard for me to take time away from MediaGoblin right now and have
it still meet its goals as a project. On the other hand, there's a lot
that we have planned for the year ahead, but some of it I'm not really
prepared enough for to make optimal decisions on. In addition, the last
year and a half really have just not given me much of a break at all,
and running a crowdfunding campaign (not to mention two over two years)
is really exhausting. (Not that I'm complaining about success!)

I was feeling pretty close to burnout, but given how much there is to
get done, I decided to take a compromise on this break... instead of
taking a full fledged vacation, I'd take a "researchcation": three weeks
to recharge my batteries and step away from the day to day of the
project. In the meantime, I'd work on some projects to prepare
me for the year ahead. A number of good things came out of it, though
not exactly the same things I expected coming in. But I think it was
worth the time invested.

My original plan going in was that I would work on two things: something
related to the Pump API and federation, and something related to
deployment. It turns out I didn't get around to the deployment part, but
working on the federation part was insightful, though not in all the
ways I anticipated. Though I've read the [Pump API
document](https://github.com/e14n/pump.io/blob/master/API.md) and helped
advise a bit on the design of
[PyPump](https://github.com/xray7224/PyPump) (not to take credit for
that, clearly credit belongs to Jessica Tallon generally, not me),
there's nothing really like having a solid project to toss you into
things, and I wanted to take a non-MediaGoblin-codebase approach to
playing around with the Pump API.

I started out by hacking on a small project called PumpBus, which was
going to be a daemon which wrapped pypump and exposed a d-bus API. I
figured this would make writing clients easier (and even make it
possible to write an emacs client... yeah I know). I got far enough to
where I was able to post a message from emacs lisp, then decided that
what I was working on just wasn't that interesting and wasn't teaching
me much more than I already knew.

Given that there was both the "research" but also the "-cation"
component to this, I figured the risks of failure were low, so I'd up
the challenge of what I was working on a bit. I instead started working
on something I've dubbed [Pydraulics](https://gitorious.org/pydraulics):
a python-powered implementation of the Pump API. Worst came to worst I'd
learn a few things.

I decided from the outset to keep a few assumptions related to
pydraulics:

-   The end goal would be to have something that provided interfaces for
    object storage and retrieval... not to wrap the database itself, but
    hopefully there would not be too many views, and maybe this could
    happen on a per-view basis. This way you could easily wrap
    Pydraulics around whatever application and use the storage/database
    it's already using. That's the *end goal*. (I didn't get there ;))
-   I'd keep things simple database-wise: assuming you're not providing
    your own interface, the default interface provided is postgres +
    sqlalchemy only. There's some [new JSON-related features in
    Postgresql](http://clarkdave.net/2013/06/what-can-you-do-with-postgresql-and-json/)
    that are pretty exciting and would be appropriate here.
-   I'd use this as an opportunity to think about MediaGoblin's codebase.
    I decided I'd see how easy or hard it would be to split out
    components from MediaGoblin as I needed them into something I dubbed
    "libgoblin". For now, I'd allow this to be messy, but hopefully
    would give me a chance to think about what libgoblin should be.
-   I'd also use this to think about where MediaGoblin fits in terms of
    recent developments in asynchronous Python coding.

So, what came out of it?

-   Turns out [SQLAlchemy does a nice job of making use of Postgres'
    built-in JSON
    support](http://docs.sqlalchemy.org/en/rel_0_9/dialects/postgresql.html#sqlalchemy.dialects.postgresql.JSON).
    Early tests seemed to indicate that this choice would pay off well.
    (Left me wondering: how hard would it be for someone to write a
    python API-compatible implementation of pymongo or something?)
-   I ended up spending a lot more time on the libgoblin side of things
    than I expected. I didn't realize that MediaGoblin had become such a
    self-contained microframework until this point. I wanted to port the
    MediaGoblin OAuth views over to Pydraulics to save time, but it
    turned out this required porting a significant amount of
    MediaGoblin code over to libgoblin. I *did* get the oauth views
    working though!
-   Asynchronous stuff turned out to be interesting to explore, and I'll
    expand on what I've been thinking below.
-   I did end up getting a much, much stronger sense of the Pump API,
    which of course was the main goal, though the implementation of that
    is not yet complete.

Pondering asynchronous coding developments and
MediaGoblin/libgoblin/pydraulics turned out to be fruitful. Mostly I
have been looking at "what would it take for libgoblin to be *usefully*
integrated into
[asyncio](https://docs.python.org/3.4/library/asyncio.html)?"

This turns out to be a bit more challenging than it appears at the
outset for one reason: mg_globals. mg_globals is a pretty sad design in
MediaGoblin that I'd like to get rid of; basically it makes it easy to
write functions that don't have to have the database session and
friends, template environment, etc. passed into them, because those
are set on a global variable level. That works (but is nasty) as long as
you're not in a multithreaded environment, but breaks as soon as you
are. I recently [created a ticket reflecting
such](https://issues.mediagoblin.org/ticket/893), suggesting switching
over to [werkzeug context locals](http://werkzeug.pocoo.org/docs/local/)
(Flask makes heavy use of this). Werkzeug's hack is clever, using thread
locals so that even in a multi-threaded environment, the objects you're
accessing are still globals, but they're the *right* globals.
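
To make the "right globals" idea concrete, here's a minimal sketch using the standard library's `threading.local`. This is not Werkzeug's actual implementation, and `ctx_globals`, `handle_request` and the fake "session" strings are made up for illustration. The barrier just forces both threads to write before either one reads:

``` python
import threading

# Hypothetical stand-in for mg_globals, backed by thread-local storage.
ctx_globals = threading.local()

# Forces both threads to set their value before either reads it back.
both_set = threading.Barrier(2)


def handle_request(name, seen):
    ctx_globals.db = "session-for-" + name   # looks like a global write
    both_set.wait()                          # the other thread has written too
    seen[name] = ctx_globals.db              # still this thread's own value


thread_seen = {}
workers = [threading.Thread(target=handle_request, args=(n, thread_seen))
           for n in ("a", "b")]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Each thread reads back the value it wrote, even though both are going through the same module-level name.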

But Werkzeug's solution is not good enough for integration with asyncio,
where you might be doing "asynchronous" behavior in the same thread,
suspending and resuming coroutines, coming back to tasks, etc. As
such, it's almost guaranteed in this system that you'll be clobbering
the variables another task needs.
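
Here's a small sketch of that clobbering, assuming a thread-local "global" like the one described above. The names are hypothetical, and it uses the current `async`/`await` syntax rather than the `yield from` style asyncio shipped with in Python 3.4. Both coroutines run in one thread, so they share the same thread-local slot:

``` python
import asyncio
import threading

# Hypothetical thread-local "global"; both coroutines below share it,
# because they both run in the same thread.
shared_local = threading.local()


async def handle_request(name, seen):
    shared_local.db = "session-for-" + name
    await asyncio.sleep(0)            # suspend; the other coroutine runs now
    seen[name] = shared_local.db      # may no longer be what we set


async def main():
    seen = {}
    await asyncio.gather(handle_request("a", seen),
                         handle_request("b", seen))
    return seen


clobbered = asyncio.run(main())
```

By the time request "a" resumes, request "b" has already overwritten the value, so "a" ends up reading "b"'s session.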

What to do? I did research to see if anyone had ideas. It looks like
[you could do such a thing with
Task.current_task()](https://groups.google.com/forum/#!topic/python-tulip/j0cSjUGx8qk)
in asyncio, and that would be fairly equivalent. I think you'd need
careful implementation though... if you're not paying close attention
you might not attach the right things to the right subtask, and that
whole thing just seems... fragile. But it still is a neat idea to play
around with.
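
A rough sketch of what that could look like, under some assumptions: it uses `asyncio.current_task()`, which is the spelling in recent Python versions of the `Task.current_task()` call mentioned in that thread, and the `set_db` / `get_db` helpers and the weak-keyed dictionary are my own invention for illustration:

``` python
import asyncio
import weakref

# Hypothetical registry: one context value per running task.  Weak keys
# mean the entry goes away once the task itself is garbage collected.
_task_context = weakref.WeakKeyDictionary()


def set_db(value):
    _task_context[asyncio.current_task()] = value


def get_db():
    return _task_context[asyncio.current_task()]


async def handle_request(name, seen):
    set_db("session-for-" + name)
    await asyncio.sleep(0)        # other tasks get to run here
    seen[name] = get_db()         # looked up by task, so still our own


async def main():
    seen = {}
    await asyncio.gather(handle_request("a", seen),
                         handle_request("b", seen))
    return seen


per_task = asyncio.run(main())
```

The fragile part shows up as soon as a handler spawns a task of its own: that new task is a different key, so `get_db()` inside it finds nothing unless you remember to copy the value across.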

But here are some ideas that I think are neat all combined, related to
this problem:

-   The idea that the request or asyncio task is the main object that
    you attach useful variables to, and you just pass that thing around
    as a "universal context" like crazy. (The downside: what happens
    when you aren't using asyncio, or don't need an http request, like
    in a migration script?)
-   I like the idea of the application being multi-instance'able, and
    then having requests and a local context as a layer on top of that.
    So you've got the "instantiated application" layer, and on top of
    that the "current request" layer, and at both levels you can have
    variables attached (like the database engine on the application
    level, but the database session on the request level). That's an
    awesome distinction.

But you don't really know whether or not some bit of code is using an
asyncio task, a web request, or whatever to pass around. Here's the
thing though: it doesn't really matter most of the time. With rare
exceptions, you're just looking for \$OBJECT.db or \$OBJECT.templates or
something. You just need some kind of object you can tack attributes on
to.

So that's my idea in libgoblin/pydraulics: when you have an application
and you want to do something with it (handle a request, execute a task,
etc), you just need an object to tack stuff onto. So either create a fresh
context object to tack stuff onto or just start tacking things onto an
object you have!

Currently, this looks like:

``` python
# Pydraulics -- Easy Pump API integration into your software.
#
# Copyright (C) 2014 Pydraulics contributors.  See AUTHORS.
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
#
# Large parts of this heavily borrowed from MediaGoblin.
# Copyright (C) 2011-2014 MediaGoblin contributors, see MEDIAGOBLIN-AUTHORS

class PydraulicsApp(object):
    """
    Pydraulics "basic" WSGI application
    """

    # ...

    def gen_context(self, request=None):
        """
        Generate or apply a context.

        If we have a request, use that as the context instead.
        """
        if request is not None:
            c = request
            c.app = self
        else:
            c = Context(self)

        c.template_env = jinja2.Environment(
            loader=self.template_loader, autoescape=True,
            undefined=jinja2.StrictUndefined,
            extensions=[
                'jinja2.ext.autoescape'])

        # Set up database
        c.db = self.Session()

        if request is not None:
            c.url_map = url_map.bind_to_environ(request.environ, server_name=self.netloc)
        else:
            c.url_map = url_map.bind(
                self.netloc,
                # Maybe support this later
                script_name=None)

        # Hand the populated context back to the caller
        return c
```

Anyway, simple enough. Then you have request.db made, or if you've just
got a command line script and you need the equivalent of that and you
already have your instantiated application, just run
application.gen_context(). Thus, for utilities that are working with
this application and need a variety of instantiated things (the database
and the template engine and so on and so on) it's easy enough to just
accept "context" as the first argument of the function, then use
context.db and so on. (I've considered using just "c" or "ctx" instead of
"context" as the variable name since it seems so common and since it
conflicts a bit with template context and friends, though that's not
very explicit.) So, this seems good.
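
As a rough illustration of the calling convention, here's a stripped-down sketch. `TinyApp`, this bare `Context` class, `count_users` and the list standing in for a database session are all hypothetical stand-ins, not the real Pydraulics code:

``` python
from types import SimpleNamespace


class Context(object):
    """Hypothetical bare context: just something to hang attributes on."""
    def __init__(self, app):
        self.app = app


class TinyApp(object):
    """Stand-in for an instantiated application (not PydraulicsApp itself)."""
    def __init__(self, users):
        self.users = users           # application-level state

    def gen_context(self, request=None):
        if request is not None:
            c = request
            c.app = self
        else:
            c = Context(self)
        c.db = list(self.users)      # placeholder for a per-context db session
        return c


def count_users(context):
    # Only needs "something with a .db"; doesn't care what kind of object.
    return len(context.db)


app = TinyApp(["alice", "bob"])

# From a command line script: no request, so a fresh context is made.
script_count = count_users(app.gen_context())

# From a web view: the (fake) request object itself becomes the context.
fake_request = SimpleNamespace(environ={})
request_count = count_users(app.gen_context(fake_request))
```

The utility function is the same in both cases; only the caller knows whether the thing it passed in was a request or a plain context.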

At one point I got frustrated with the massive amount of porting to
libgoblin I was having to do and thought "I really should probably just
use Django or Flask itself." However, I found that neither framework
really addresses the asyncio stuff I was dealing with above, and once I
got enough of libgoblin ported over, libgoblin-based development was very
fast and comfortable.

That said, working through those things took up enough time that I
didn't complete the Pump implementation. That's okay, I've got enough to
do what's required on my end from MediaGoblin (and we've got good
direction and help on the federation end this upcoming year where the
most important thing is that I have a good understanding). I still think
pydraulics is a pretty neat idea, and I may finish it, though it'll be
back-burner'ed for now.

However, libgoblin is something I'm likely to extract. I'm convinced
that MediaGoblin is at a point where it's stable enough to know what
works and doesn't about the technical design, so that gives me a good
basis to know what to build from here. There are other applications I'd
like to build which should mesh nicely with MediaGoblin but which really
don't belong as part of MediaGoblin itself, and would be kind of hacky
add-ons. Clearly this is not the most important development, but towards
the end of the summer as we hopefully get the Python 3 branch merged, I
will be looking towards this.

Aside from this, on the "-cation" end of things, I took some time to
relax and also reapproach my health. I may have a separate post on that
soon.

So, that's that. Overall it was productive, but again, not quite in the
ways I was expecting. I feel okay about that though... I wanted to do
some hacking and not feel deeply pressured or stressed about it... if
that wasn't true, I think the "-cation" part wouldn't have held up. So I
feel okay that I wandered a bit, and the other things I worked on /
found I think are important anyhow, and have me much better prepared for
the year ahead. Not to mention the most important part: I feel pretty
refreshed and capable of taking it on!

What's next for the coming week? Well now that this is all over, I'm
organizing plans so we can get rewards out the door and do project
planning for the year ahead. We've got a lot of promises to fulfill.
Better get to it!
